Abstract: Application of the turbo principle to multiuser
decoding results in an exchange of probability distributions
between two sets of constraints: first, the constraints imposed by
the multiple-access channel, and second, the individual constraints
imposed by each user's error control code. A-posteriori probability
computation for the first set of constraints is prohibitively
complex for all but a small number of users. Several lower 
complexity approaches have been proposed in the literature. 
One class of methods is based on linear filtering (e.g. LMMSE). 
A more recent approach is to compute approximations to 
the posterior probabilities by marginalising over a subset of 
sequences (list detection). Most of the list detection methods are 
restricted to non-singular systems. In this paper, we introduce a 
transformation that permits application of standard tree-search 
methods to underdetermined systems. We find that the resulting 
tree-search based receiver outperforms existing methods. 

I. Introduction 

It is well known that joint decoding can improve perfor- 
mance in multiple-access systems. Joint maximum likelihood 
(ML) decoding, which minimizes the overall probability of
error, is however prohibitively complex [1]. Brute-force
computation of the jointly ML codeword sequences for K users is
$O(Q^{Kk})$ for Q-ary modulation and constraint length k codes.

The good performance and low complexity of the turbo 
decoder [2] led to application of the turbo principle to joint 
multiuser decoding. Figure 1 shows a schematic representation 
of the "canonical" iterative multiuser decoder [3-6]. This 
decoder treats the users' forward error correction codes as 
an "outer code" and the interdependency introduced by the 
multiple access channel as an "inner code". The decoder 
iterates between a-posteriori probability (APP) computation 
for the inner code and individual APP decoding of each user's 
FEC code. The multiuser APP computation is $O(Q^K)$, an
improvement over joint ML decoding, but still prohibitive. 

One low complexity alternative is to replace the inner APP 
decoder with a linear filter. Examples include soft interference 
cancellation [7, 8] and linear minimum mean-squared error 
filtering [9]. These approaches can work quite well, but there is 
still room for improvement compared to the exact computation 
of the multiuser APP. 

A more powerful approach is to compute an approximation 
of the multiuser APP by marginalizing over a subset of
sequences (in many cases only a small subset is required), found
using various list detectors [12-19]. Most of these
list-detection based methods rely on Cholesky decomposition 
of the correlation matrix as a first step. In underdetermined 
systems (more users than signaling dimensions), this decom- 
position cannot be performed. First steps towards avoiding 
this problem have been made in [20,21], based on filtering, 
followed by lattice reduction. 

The main contribution of this paper is a simple transforma- 
tion which creates a virtual full rank system, which permits 
Cholesky decomposition and the straight-forward application 
of tree-search methods in underdetermined multiuser systems. 
The method requires less computational complexity than the
one described in [20,21]. Numerical results demonstrate the 
superior performance of the approach, which is compared to 
other techniques from the literature. 

II. System Model and Canonical Decoder 

Consider a multiple-access system with K radio terminals, 
or users, simultaneously transmitting forward error correction 
coded digital data across an additive white Gaussian noise 
(AWGN) channel. The encoder for user k = 1,2,..., K 
operates as follows. A length $I$ frame of independent
equiprobable information bits is encoded by a rate $R_c$ code
$\mathcal{C}_k$. The $I/R_c$ coded bits $\mathbf{c}_k$ are then
permuted with the interleaver $\Pi_k$, and parsed into length
$\log Q$ segments. These segments are mapped onto a stream of
$I/(R_c \log Q)$ constellation symbols according to some
memoryless mapping, and then multiplexed onto the symbol sequences
of length $I/(R_c \log Q)$. Each user transmits at a rate of
$R_c \log Q$ bits per channel use.

A data vector $\mathbf{d} = (d_1, \ldots, d_K)^T \in \mathcal{D}^K$
represents all users' symbols in a given symbol interval (assuming
symbol-synchronous transmission for simplicity of explanation). The
complex constellation $\mathcal{D} \subset \mathbb{C}$ has
$|\mathcal{D}| = Q$ unique elements, with moment constraints
$E[\mathbf{d}] = \mathbf{0}$ and $E[\mathbf{d}\mathbf{d}^*] = P\mathbf{I}_K$,
and symbols are equiprobable. The average transmit power per user
is $P$.

Each symbol is multiplied by a length $L$ modulation vector
$\mathbf{s}_k$, which has real random elements chosen uniformly from
$\pm 1/\sqrt{L}$. A vector $\mathbf{z} \in \mathbb{C}^L$ with
independent white zero-mean Gaussian elements represents thermal
noise with variance $\sigma^2$ per real dimension. In a coded system
where each user employs a rate $R$ code and transmits with power $P$,
the appropriate signal-to-noise measure we will use is
$E_b/N_0 = P/(2\sigma^2 R \log Q)$.
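As a small worked example of this signal-to-noise definition, the following Python sketch converts a target $E_b/N_0$ in dB into the noise variance per real dimension. The function name `noise_variance` and the choice of a base-2 logarithm for $\log Q$ are assumptions made for illustration.

```python
import math

def noise_variance(ebn0_db, power=1.0, rate=0.5, q=2):
    """Noise variance per real dimension from Eb/N0 = P / (2 sigma^2 R log2 Q)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)         # convert dB to a linear ratio
    return power / (2.0 * rate * math.log2(q) * ebn0)

# Rate-1/2 coded BPSK at 5 dB, unit transmit power per user.
sigma2 = noise_variance(5.0)
```

For rate-1/2 BPSK with unit power the denominator reduces to the linear $E_b/N_0$ itself, so 0 dB corresponds to a variance of exactly one.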



We assume that each user's signals are received with identi- 
cal power, phase and delay, although these are not fundamental 
restrictions imposed by the proposed receiver. After standard 
manipulations the multiple-access channel may be represented 
by 



$$\mathbf{r} = \mathbf{S}\mathbf{d} + \mathbf{z} \tag{1}$$



where $\mathbf{S} = (\mathbf{s}_1, \ldots, \mathbf{s}_K) \in \{\pm 1/\sqrt{L}\}^{L \times K}$.
We only consider the case $K > L$, i.e. the number of users exceeds
the number of independent observations.
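The channel model can be illustrated with a short NumPy sketch that draws one symbol interval of an overloaded system. BPSK symbols with unit power and the helper name `simulate_channel` are assumptions for illustration only.

```python
import numpy as np

def simulate_channel(K, L, sigma2, rng):
    """Draw one symbol interval of r = S d + z for BPSK users (Q = 2, P = 1)."""
    # Spreading matrix with entries chosen uniformly from +-1/sqrt(L).
    S = rng.choice([-1.0, 1.0], size=(L, K)) / np.sqrt(L)
    # Equiprobable BPSK symbols, one per user.
    d = rng.choice([-1.0, 1.0], size=K)
    # Complex Gaussian noise with variance sigma2 per real dimension.
    z = np.sqrt(sigma2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
    r = S @ d + z
    return S, d, r

rng = np.random.default_rng(0)
S, d, r = simulate_channel(K=12, L=8, sigma2=0.3, rng=rng)
```

Every column of `S` has unit norm, and with `K > L` the Gram matrix `S.T @ S` has rank at most `L`, which is the rank deficiency addressed in Section III.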

The canonical iterative decoder is shown in Fig. 1. The goal
is to infer the value of $\mathbf{u}_k$, $k = 1, \ldots, K$, based on
$\mathbf{r}$, $\mathbf{S}$ and the constraints $\mathcal{C}_k$. The module
labelled Multiuser APP computes the marginal posterior probability
matrix $\omega(\mathbf{d}) \in \mathbb{P}^{Q \times K}$, which has as
columns the probability mass functions for the corresponding symbols,
based on all the available information using the constraint (1), as
well as the prior probability matrices $\omega_a(\mathbf{d})$. This
inner decoder is the focus of this work.

The other constraint, separated from the first by an
interleaver, is imposed by the single-user decoders, which calculate
extrinsic probabilities $\omega_e(\mathbf{c}_k)$ based on the codes and
$\omega_a(\mathbf{c}_k)$ for all $k$. The process repeats by iteratively
exchanging information in the form of extrinsic probability matrices
between the two modules. The individual APP decoders also compute, on
the final iteration, the data sequence probabilities
$\omega(\mathbf{u}_k)$.
Fig. 1. Iterative joint-APP multi-user receiver.

The computation of the marginal symbol posteriors in
the APP detector entails marginalising the joint probability
$p(\mathbf{d}, \mathbf{r} \mid \mathbf{S}, N_0)$ over each possible
sequence $\mathbf{d}$. Brute-force marginalisation for each user is
therefore a summation with $Q^{K-1}$ terms, clearly impractical for all
but small numbers of users. In practice, however, nearly all of the
probability for systems of interest is contained in a relatively small
subset of $\mathcal{P}$ of those terms [10], and provided those terms
can be isolated, the marginalisation can become computationally
tractable. This idea goes back to [11] (in an uncoded context) and has
seen a recent revival in the framework of iterative processing [12-19].
The goal of the next section is to approximate this sum with greatly
reduced complexity.
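To make the cost of the exact computation concrete, the sketch below enumerates every sequence for a toy system and accumulates the symbol posteriors. It is a minimal illustration under assumed names (`brute_force_app`, `prior`, `constellation`), using the Gaussian likelihood $\exp(-\|\mathbf{r} - \mathbf{S}\mathbf{d}\|^2/N_0)$; it is only practical for very small $K$.

```python
import itertools
import numpy as np

def brute_force_app(r, S, n0, prior, constellation):
    """Exact symbol posteriors by summing over all Q^K sequences.

    prior[q, k] is the prior probability that user k sent constellation[q].
    Returns a Q x K matrix whose columns are posterior mass functions.
    """
    Q, K = prior.shape
    cols = np.arange(K)
    post = np.zeros((Q, K))
    for seq in itertools.product(range(Q), repeat=K):
        idx = np.array(seq)
        d = constellation[idx]
        # Unnormalised joint probability of this sequence and the observation.
        log_p = (-np.linalg.norm(r - S @ d) ** 2 / n0
                 + np.sum(np.log(prior[idx, cols])))
        # Credit the sequence weight to the symbol each user sent in it.
        post[idx, cols] += np.exp(log_p)
    return post / post.sum(axis=0, keepdims=True)

# Toy overloaded system: K = 3 users, L = 2 dimensions, noise-free observation.
constellation = np.array([-1.0, 1.0])
S = np.array([[1.0, 1.0, -1.0], [1.0, -1.0, 1.0]]) / np.sqrt(2)
r = S @ np.array([1.0, 1.0, -1.0])
prior = np.full((2, 3), 0.5)
post = brute_force_app(r, S, n0=0.5, prior=prior, constellation=constellation)
```

The loop body runs $Q^K$ times, which is what motivates restricting the sum to a short list of dominant sequences.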



III. Approximation of the Multiuser APP 

The posterior log joint-probability of a particular hypothesis
sequence $\mathbf{d}'$ is equal to

$$\log p(\mathbf{r}, \mathbf{d}' \mid \mathbf{S}, N_0) = c - \frac{1}{N_0}\|\mathbf{r} - \mathbf{S}\mathbf{d}'\|^2 + \log p(\mathbf{d}') \tag{2}$$

where $c$ is a constant and $p(\mathbf{d}')$ is the prior probability of the
sequence. Expand the squared-distance term as



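$$\|\mathbf{r} - \mathbf{S}\mathbf{d}'\|^2 = \mathbf{r}^*\mathbf{r} - 2\Re\{\mathbf{r}^*\mathbf{S}\mathbf{d}'\} + \mathbf{d}'^*\mathbf{S}^*\mathbf{S}\mathbf{d}' = c + \Re\{\mathbf{y}\mathbf{d}'\} + \|\mathbf{S}\mathbf{d}'\|^2 \tag{3}$$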



where $\mathbf{y} = -2\mathbf{r}^*\mathbf{S}$. In order to simplify the search for
sequences $\mathbf{d}'$ that minimise (3), a recursive expression in
$d'_1, \ldots, d'_K$ may be obtained if $\mathbf{G} = \mathbf{S}^*\mathbf{S}$ is
positive-definite. This cannot be the case when $K > L$. In our model
$\mathbf{S}$ is not even guaranteed to have rank $L$. In order to obtain an
equivalent full-rank system, we exploit the following representation.
where $\rho_k \in \mathbb{R}$ is a free parameter. The terms in brackets define
the columns of a new matrix, and a sufficient condition for 
that matrix to be positive-definite is that each term is positive 
irrespective of d. The following procedure is used to transform 
the log-likelihood into an additive recursive metric with K 
terms. This technique, which we believe to be new, is the 
main contribution of this paper. 

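1) Choose a positive constant $\rho \in \mathbb{R}_+$ satisfying

$$\rho > (K - 1)\max \ldots \tag{4}$$

where $\mathcal{D}_1, \ldots, \mathcal{D}_Q$ are the elements of $\mathcal{D}$.

2) Construct the vector $\mathbf{u} = \mathrm{diag}(\mathbf{G}) - \rho\mathbf{1} \in \mathbb{R}^K$, where
$\mathbf{1}$ is the all-ones vector.

3) Construct a new matrix $\tilde{\mathbf{G}}$ by setting all diagonal
elements of $\mathbf{G}$ equal to $\rho$. The matrix $\tilde{\mathbf{G}}$ is
guaranteed to be positive-definite.

4) Compute the factorisation $\mathbf{T}^*\mathbf{T} = \tilde{\mathbf{G}}$, where
$\mathbf{T} \in \mathbb{R}^{K \times K}$ is lower-triangular.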
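The four steps above can be sketched in NumPy as follows. This is an illustrative reading of the procedure, not a reference implementation: the function name `full_rank_transform` is hypothetical, the constant of step 1 is passed in as `rho`, and the lower-triangular factor is obtained by applying the library Cholesky routine to the index-reversed matrix, which is one of several ways to get a factor of the required shape.

```python
import numpy as np

def full_rank_transform(S, rho):
    """Return (u, G_tilde, T) for the virtual full-rank system.

    Assumes rho exceeds the largest off-diagonal absolute row sum of G,
    which holds for rho > K - 1 when the columns of S have unit norm.
    """
    G = S.conj().T @ S                      # K x K correlation matrix, rank <= L
    u = np.real(np.diag(G)) - rho           # step 2: u = diag(G) - rho * 1
    G_tilde = G.copy()
    np.fill_diagonal(G_tilde, rho)          # step 3: diagonal replaced by rho
    # Step 4: lower-triangular T with T^* T = G_tilde. NumPy's Cholesky
    # returns C with A = C C^*, so factor the index-reversed matrix and
    # reverse the result to obtain an upper-triangular U with U U^* = G_tilde.
    C = np.linalg.cholesky(G_tilde[::-1, ::-1])
    T = C[::-1, ::-1].conj().T              # T = U^* is lower-triangular
    return u, G_tilde, T

rng = np.random.default_rng(0)
S = rng.choice([-1.0, 1.0], size=(8, 12)) / np.sqrt(8)   # K = 12 > L = 8
u, G_tilde, T = full_rank_transform(S, rho=12.0)
```

With unit-norm spreading sequences every off-diagonal entry of `G` has magnitude at most one, so `rho > K - 1` makes `G_tilde` strictly diagonally dominant and the factorisation always succeeds. The quadratic form is preserved in the sense that `d @ G @ d` equals `norm(T @ d)**2 + u @ abs(d)**2` for any `d`.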

By setting the prior term $C(\mathbf{d}') = -N_0 \log p(\mathbf{d}')$, and
assuming statistical independence of the prior symbol probabilities
due to the interleaver, (2) may be written as

$$-N_0 \log p(\mathbf{r}, \mathbf{d}' \mid \mathbf{S}, N_0) = c + \sum_{k=1}^{K} \left[ \Re\{y_k d'_k\} + \Big|\sum_{j=1}^{k} T_{kj} d'_j\Big|^2 + C(d'_k) + u_k |d'_k|^2 \right] \tag{5}$$



For constant energy symbol constellations such as Q-ary PSK,
the last term in (5) is absorbed into $c$, and we only require
$\rho > (K - 1)$.
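As a numerical sanity check on the additive form, the sketch below evaluates the per-depth terms for one BPSK sequence and compares their sum with the directly computed quantity $\|\mathbf{r} - \mathbf{S}\mathbf{d}'\|^2 - \|\mathbf{r}\|^2 - N_0 \log p(\mathbf{d}')$. All names (`branch_metrics`, `log_prior`, the seed and the dimensions) are illustrative assumptions, and the inner term is taken to be the squared magnitude of the partial sum over the lower-triangular factor.

```python
import numpy as np

def branch_metrics(d, y, T, u, n0, log_prior):
    """Per-depth terms of the additive metric for one full sequence d.

    log_prior[k] is log p(d_k) for the symbol hypothesised at depth k.
    """
    partial = np.tril(T) @ d                # entry k is sum_{j<=k} T[k, j] d[j]
    return (np.real(y * d) + np.abs(partial) ** 2
            + u * np.abs(d) ** 2 - n0 * log_prior)

rng = np.random.default_rng(1)
L, K, n0 = 4, 6, 0.5
S = rng.choice([-1.0, 1.0], size=(L, K)) / np.sqrt(L)
d = rng.choice([-1.0, 1.0], size=K)
r = S @ d + np.sqrt(n0 / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))

rho = float(K)                              # any value above K - 1 for BPSK
G = S.T @ S
u = np.diag(G) - rho
G_tilde = G.copy()
np.fill_diagonal(G_tilde, rho)
T = np.linalg.cholesky(G_tilde[::-1, ::-1])[::-1, ::-1].T   # T.T @ T = G_tilde
y = -2 * r.conj() @ S

log_prior = np.full(K, np.log(0.5))         # uniform BPSK priors
recursive = branch_metrics(d, y, T, u, n0, log_prior).sum()
direct = (np.linalg.norm(r - S @ d) ** 2 - np.linalg.norm(r) ** 2
          - n0 * log_prior.sum())
```

The two values agree to floating-point precision, and because the term at depth `k` depends only on the first `k` symbols, partial sums can be accumulated along a tree path.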

Quadratic forms such as (5) admit a tree representation [11].
The equation represents a Q-ary tree of depth $K$, where each
of the $Q^k$ nodes at depth $k$ represents a partial sequence
with an associated positive path weight, and the leaf nodes
represent sequences $\mathbf{d}'$ with total path weight equal to
$c - N_0 \log p(\mathbf{d}', \mathbf{r} \mid \mathbf{S}, N_0)$. Hence, the problem
reduces to finding the $\mathcal{P}$ leaf nodes in the tree with minimum
weight, which may be approximated using tree search techniques.

The simple manipulations applied to G described above 
artificially create a virtual full rank channel from a rank- 
deficient one, assigning a positive path weight to every node 
in the tree and allowing sequential search to be applied to 
(5). This is not to say that extra information is obtained about 
the symbols via the transformations described; only that the 
information about the interfering signals is spread out onto a 
greater number of effective observations, so that any sequential 
search techniques developed for a full rank channel may also 
be applied in the overloaded or singular case. 

A transformation for overloaded linear systems was pre- 
sented in [20,21], which similarly creates a virtual full-rank 
system to which the full Q-ary tree may be assigned. The 
approach is based on a minimum mean square error gener- 
alised decision feedback equaliser filter, followed by lattice 
reduction, column re-ordering, and then triangular factorisation 
(if tree/sphere decoding is used). These transformations are 
significantly more complex than our procedure, and may 
be unsuitable for time-varying channels. The approach also 
colours the noise, so that the system no longer lends itself 
naturally to the iterative APP framework. 

A depth-first tree-search was used in [14,15,18], which 
necessitated special treatment of the prior probability on those 
paths that did not reach full depth. We propose a breadth-first 
search using the T-algorithm. The T-algorithm was used in 
[22,23] for near-optimal hard-decision decoding of channel 
codes up to a pre-determined minimum-distance with signifi- 
cant complexity savings over the Viterbi algorithm. The related 
M-algorithm retains exactly M paths at each depth, regardless
of the actual weights of each partial path (this approach was 
used in [12]). In practice the statistical nature of the noise and 
the spreading sequences for each transmission may require 
a different number of sequences to approximate the APP. It 
should also be noted that any other tree search algorithm 
could be used, with slightly varying levels of performance 
and complexity. The key step is (5), which admits such tree 
representations for overloaded systems. 

We exploit the heuristic observation that paths with very 
large partial path weight are unlikely to be components of 
low weight paths. Rather than retaining a fixed number of 
paths at each depth, the T-algorithm attempts to adapt to 
the channel conditions by only retaining paths at each depth 
with weight not exceeding the best weight by more than T, 
where T is a parameter of the algorithm. When the algorithm 
terminates at the leaves, the best $\mathcal{P}$ sequences are used in the
marginalisation.

At low SNR in the early iterations, or with few receiver 
observations compared to the number of transmitters, large 
numbers of paths will exist with similar path weight. Due to 
complexity constraints in this scenario the number of retained
paths at each depth must be limited to $\mathcal{P}_{\max}$, and the algorithm
essentially becomes the M-algorithm with $M = \mathcal{P}_{\max}$. In
more favourable circumstances however, very few paths are 
required and the T-algorithm adapts automatically to take 
advantage of the conditions with greatly reduced complexity. 

When no prior information is available the T-algorithm 
adapts very well to the channel, automatically finding a good 
performance/complexity trade-off through the parameter T. 
As a general rule, the T-algorithm only tends to approach 
the $\mathcal{P}_{\max}$ bound in the early iterations, since the search is
greatly facilitated by the prior probabilities once they become 
available. When very strong prior information is available 
however, the only sequences retained will be those dictated 
by the priors, since other paths will be discarded in the early 
depths. In this case the detector will glean little new infor- 
mation, and the information about the symbols will quickly 
become correlated over iterations. Hence, another parameter 
$\mathcal{P}_{\min}$ must also be set, forcing the algorithm to consider a
certain minimum number of sequences at each depth. The 
effect is only significant in highly loaded systems where many 
iterations are required for convergence. 
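The breadth-first search with a weight threshold and the lower and upper bounds on the number of survivors, as described in the preceding paragraphs, might be sketched as below. This is a simplified illustration rather than the receiver used for the results: the names `t_algorithm`, `list_marginals`, `threshold`, `p_min` and `p_max` are hypothetical, BPSK with uniform priors is assumed in the usage lines, and the marginalisation is taken over whatever list survives to the leaves.

```python
import numpy as np

def t_algorithm(y, T, u, n0, prior, constellation, threshold, p_min, p_max):
    """Breadth-first search keeping paths within `threshold` of the best.

    prior[q, k] is the prior probability of constellation[q] for user k.
    Returns (paths, weights): symbol-index sequences of the surviving
    leaves and their path weights, sorted from lowest to highest weight.
    """
    K, Q = T.shape[0], len(constellation)
    paths = np.zeros((1, 0), dtype=int)     # one empty path at the root
    weights = np.zeros(1)
    for k in range(K):
        # Extend every surviving path by each of the Q symbols.
        paths = np.hstack([np.repeat(paths, Q, axis=0),
                           np.tile(np.arange(Q), len(paths))[:, None]])
        weights = np.repeat(weights, Q)
        d = constellation[paths]            # shape (n_paths, k + 1)
        partial = d @ T[k, :k + 1]          # sum_{j<=k} T[k, j] d_j
        weights = weights + (np.real(y[k] * d[:, k]) + np.abs(partial) ** 2
                             + u[k] * np.abs(d[:, k]) ** 2
                             - n0 * np.log(prior[paths[:, k], k]))
        # Prune: keep paths within the threshold, bounded by p_min and p_max.
        order = np.argsort(weights)
        n_keep = int(np.sum(weights <= weights[order[0]] + threshold))
        n_keep = min(max(n_keep, p_min), p_max, len(order))
        paths, weights = paths[order[:n_keep]], weights[order[:n_keep]]
    return paths, weights

def list_marginals(paths, weights, n0, Q):
    """Approximate symbol posteriors by marginalising over the retained list."""
    w = np.exp(-(weights - weights.min()) / n0)
    post = np.zeros((Q, paths.shape[1]))
    for q in range(Q):
        post[q] = (w[:, None] * (paths == q)).sum(axis=0)
    return post / post.sum(axis=0, keepdims=True)

rng = np.random.default_rng(2)
L, K, n0 = 4, 6, 0.5
constellation = np.array([-1.0, 1.0])
S = rng.choice(constellation, size=(L, K)) / np.sqrt(L)
r = S @ rng.choice(constellation, size=K) + np.sqrt(n0 / 2) * (
    rng.standard_normal(L) + 1j * rng.standard_normal(L))
rho = float(K)
G = S.T @ S
u = np.diag(G) - rho
G_tilde = G.copy()
np.fill_diagonal(G_tilde, rho)
T = np.linalg.cholesky(G_tilde[::-1, ::-1])[::-1, ::-1].T
y = -2 * r.conj() @ S
prior = np.full((2, K), 0.5)
paths, weights = t_algorithm(y, T, u, n0, prior, constellation,
                             threshold=16 * n0, p_min=2, p_max=8)
posterior = list_marginals(paths, weights, n0, Q=2)
```

Setting `threshold` to infinity and `p_max` to $Q^K$ turns the search into exhaustive enumeration, which is a convenient way to check a pruned run on small systems. A symbol value that appears in no retained path receives zero probability in this sketch; a practical receiver would need to handle that case before passing extrinsic information to the decoders.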

The T-algorithm finds full-depth paths through the tree, and 
the prior probabilities are incorporated in a natural fashion. In 
contrast, the depth-first strategy of [14, 15] required special 
handling of the prior probabilities, while the method of [18] 
required an initial breadth-first search in order to exploit 
the priors during the main search. The M-algorithm based
approach used in [12, 13] did not directly incorporate priors 
(this was done in a separate combining step). Other approaches 
incorporate the prior probability into spherical or branch-and- 
bound decoders in various ways [16, 17, 19], but tend to be 
quite complex and unsuitable for large-dimensional systems. 

The receiver complexity is dominated by Cholesky factorisation
of the $K \times K$ matrix $\tilde{\mathbf{G}}$, and then by the tree search
during the iterations. The complexity of the T-algorithm is
upper bounded by $K\mathcal{P}_{\max}$ node computations per iteration,
but this bound only ever tends to be reached in the early 
iterations, as discussed above. Contrast this for example with 
the LMMSE filter [9] as the inner detector, which requires an 
initial matrix inverse, and then a matrix inverse per user on 
each subsequent iteration. 

IV. Numerical Results 

In this section we consider a benchmark model, with length 
L = 8 random PN spreading sequences and no fading. The 
model is difficult to work with, since a significant probability 
exists that the spreading matrix S will have linearly dependent 
rows. 

The individual users transmit BPSK symbols, which are 
encoded with a nonsystematic 4-state rate 1/2 convolutional 
code, described by the feed-forward generator polynomials
(05, 07). A length $2I = 1000$ interleaver between the encoder
and the transmitter is generated randomly for each user.
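For readers who want to reproduce the transmitter side, the following sketch encodes a frame with the 4-state (5, 7) octal feed-forward code and applies a random interleaver. Several details are assumptions not stated in the text: the frame is taken as 500 information bits so that 1000 coded bits result, the trellis is not terminated, and BPSK maps bit 0 to +1 and bit 1 to -1.

```python
import numpy as np

def conv_encode_5_7(bits):
    """Rate-1/2 feed-forward encoder with octal generators (5, 7), 4 states."""
    s1 = s2 = 0                             # the two most recent input bits
    out = []
    for b in bits:
        out.append(b ^ s2)                  # generator 5 = 101: taps on b and s2
        out.append(b ^ s1 ^ s2)             # generator 7 = 111: taps on all three
        s1, s2 = b, s1                      # shift the register
    return np.array(out, dtype=int)

rng = np.random.default_rng(3)
info = rng.integers(0, 2, size=500)         # assumed frame of 500 information bits
coded = conv_encode_5_7(info.tolist())      # 1000 coded bits
perm = rng.permutation(coded.size)          # random interleaver for one user
symbols = 1 - 2 * coded[perm]               # assumed BPSK mapping 0 -> +1, 1 -> -1
```

Each user would draw an independent permutation, so that the symbol streams seen by the multiuser detector are decorrelated from the code structure.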

We consider joint iterative decoding of the system using 
the T-algorithm, where $\mathcal{P}_{\max}$ is set to 512 and the threshold
$T$ is set to $16 N_0$. The bound $\mathcal{P}_{\max}$ is deliberately set large
in order to demonstrate the performance advantage of closely
approximating the APP. The T-algorithm for the overloaded
case is furnished by the matrix manipulations proposed in 
Section III for computing the log-likelihood. 
Fig. 2. Comparative CDMA system BER performance as a function of
number of users $K$. Spreading gain $L = 8$, $E_b/N_0 = 5$ dB.

The performance of the T-algorithm in the above model 
is shown as a function of the number of users in Figure 2, 
at $E_b/N_0 = 5$ dB. Also shown is the performance of two
linear filters commonly used as the multi-user detector in such
systems, the PIC [8] and the LMMSE [9] filters. The parameter
$\mathcal{P}_{\min}$ for the T-algorithm is set to 32 for $K \le 16$, $\mathcal{P}_{\min} = 64$
for $K = 17, 18$, and $\mathcal{P}_{\min} = 128$ for $K = 19, 20$ users. These
values were found by experiment to be sufficiently large for 
the loads considered. 

While very computationally efficient, the PIC can only 
support 9 users after 20 iterations, and is clearly not suitable 
for highly loaded systems. The MMSE filter performs better, 
supporting 14 users after 20 iterations, but requires a matrix 
inversion per user per iteration. 

Estimating the detector APP is a highly non-linear cal- 
culation, and the linearised models and assumptions used 
by the above filters are not necessarily valid in the model 
under consideration. The performance of the T-algorithm in 
Figure 2 clearly demonstrates the performance advantage of 
approximating the APP directly using the rules of probability. 
List-detection using the T-algorithm supports 16 users with 
only 5 iterations and 19 users after 20 iterations at 5 dB. 
To our knowledge, loads approaching those achieved here have
not previously been reported for such a system.

Figure 3 shows the performance of list detection
as a function of $E_b/N_0$, for various numbers of users, after
20 receiver iterations. Note that without the log-likelihood
transformation of Section III, the tree search with 20 users
would require at least one stage with $2^{K-L} = 4096$ node
computations, assuming the best case that $\mathbf{S}$ has rank $L$.
An extrinsic-information transfer chart [24] shows that the 
T-algorithm detector is very well matched in shape to the 
particular code, which helps to explain the good performance 
after many iterations, even at very high loads. The charts, 
which we do not include here, also predict very accurately 
the convergence characteristics shown in Figures 2 and 3. 




Fig. 3. CDMA system BER performance after 20 iterations as a function
of SNR ($E_b/N_0$). Spreading gain $L = 8$. The plot includes a single-user bound.

Figure 4 shows the spectral efficiency of the receiver for 
the system described above, as a function of $E_b/N_0$,
measured as the maximum number of users for which single-user
performance is reached. Also shown is the maximum
spectral efficiency C achievable by using both an optimal 
joint receiver, and an MMSE detector followed by single-user 
decoding. These curves were approximated using the large- 
systems expressions for random spreading given in [25] under 
the constraint C = KR/L with R = 1/2. 
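The load constraint $C = KR/L$ is simple arithmetic, illustrated here for the user counts quoted earlier for Figure 2 at 5 dB after 20 iterations (9 for the PIC, 14 for the MMSE filter, 19 for list detection). The function name `spectral_efficiency` is an assumption for illustration.

```python
def spectral_efficiency(num_users, code_rate=0.5, spreading_gain=8):
    """System load C = K R / L for rate-R coded BPSK users."""
    return num_users * code_rate / spreading_gain

# Loads corresponding to 9, 14 and 19 supported users with L = 8, R = 1/2.
loads = {K: spectral_efficiency(K) for K in (9, 14, 19)}
```

With $L = 8$ and $R = 1/2$, sixteen users correspond to a load of exactly one, so any receiver supporting more than sixteen users is operating above unit load.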

The receiver easily approaches optimal joint-processing data 
rates at low SNR, but cannot maintain this slope with in- 
creasing SNR. Nevertheless, the iterative T-algorithm receiver 
outperforms any other practical algorithm we are aware of in 
terms of system load for the given channel model. Various 
other results for randomly spread CDMA in AWGN using 
a rate 1/2 code are available in the literature, utilising the 
same canonical receiver structure but differing in the multi- 
user detector implementation. These are shown in Figure 4, 
along with references to the relevant papers. 

V. Conclusion 

We have shown that near-optimal performance may be 
achieved with low complexity in a randomly spread CDMA 
channel by employing the turbo principle in an iterative 
receiver. This is not a new observation; our contribution is 
to show that by attempting to calculate the true symbol-APP 
distributions in the inner detector, the performance is signif- 
icantly improved over detectors that employ linear filters, or 
other structures derived using alternative considerations. Close 
approximation of the desired APP distributions in overloaded 
or singular channels is practically facilitated by the simple
procedure of Section III, and by the application of simple and
well-known sequential algorithms.

Fig. 4. Maximum spectral efficiency achieved by the optimal receiver, one-
shot LMMSE filter, and the iterative list-detection receiver for the system
described above, as a function of $E_b/N_0$. Code rate $R = 1/2$. Also shown as
crosses are rates achieved in the randomly spread CDMA channel by receivers
described in the literature. The primary work describing each receiver is
referenced directly on the plot.

References 

[1] T. R. Giallorenzi and S. G. Wilson, "Multiuser ML sequence estimator 

for convolutionally coded asynchronous DS-CDMA systems," IEEE 

Trans. Commun., vol. 44, no. 8, pp. 997-1008, Aug. 1996.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit 

error-correcting coding and decoding: Turbo-codes (1)," in Proc. IEEE 

Int. Commun. Conf., Geneva, 1993, pp. 1064-1070. 
[3] Mark C. Reed, Christian B. Schlegel, Paul D. Alexander, and John A. 

Asenstorfer, "Iterative multiuser detection for CDMA with FEC: Near 

single user performance," IEEE Trans. Commun., vol. 46, no. 12, pp.

1693-1699, Dec. 1998. 
[4] M. C. Valenti and B. D. Woerner, "Iterative multiuser detection for 

convolutionally coded asynchronous DS-CDMA," in IEEE Int. Symp. 

Personal, Indoor, Mobile Radio Commun., Boston, USA, Sept. 1998, 

pp. 213-217. 

[5] Michael Moher, "An iterative multiuser decoder for near-capacity 
communications," IEEE Trans. Commun., vol. 46, no. 7, pp. 870-880,
July 1998. 

[6] Michael Moher and T. Aaron Gulliver, "Cross-entropy and iterative 
decoding," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3097-3104, 
Nov. 1998. 

[7] J. Hagenauer, "Forward error correcting for CDMA systems," in IEEE 
Int. Symp. Spread Spectrum Techn. App., Mainz, Germany, 1996, pp. 
566-569. 

[8] P. D. Alexander, A. J. Grant, and M. C. Reed, "Iterative detection 

on code-division multiple-access with error control coding," European 

Trans. Telecomm., vol. 9, pp. 419-426, 1998.
[9] X. Wang and H. Poor, "Iterative (turbo) soft interference cancellation 

and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, pp. 

1046-1061, 1999. 

[10] A. P. Kind and A. J. Grant, "On estimating the symbol APP in MIMO 
systems," in Proc. IEEE Int. Symp. Inform. Theory, Chicago, USA, 
June-July 2004. 

[11] L. Wei, L. Rasmussen, and R. Wyrwas, "Near optimum tree-search 
detection schemes for bit-synchronous multiuser CDMA systems over 
Gaussian and two-path Rayleigh fading channels," IEEE Trans. Com- 
mun., vol. 45, no. 6, pp. 691-700, 1997. 

[12] A. B. Reid, A. J. Grant, and P. D. Alexander, "List detection for multi- 
access channels," in Proc. IEEE GLOBECOM '02, 2002, vol. 2, pp. 
1083-1087. 



[13] A. B. Reid, A. J. Grant, and A. P. Kind, "Low-complexity list-detection 

for high-rate multiple-antenna channels," in Proc. Int. Symp. Inform. 

Theory, Yokohama, Japan, 2003, p. 273. 
[14] J. Hagenauer, "A soft-in/soft-out list sequential (LISS) decoder for turbo 

schemes," in IEEE Int. Symp. Inform. Theory, 2003, p. 382.
[15] S. Baro, J. Hagenauer, and M. Witzke, "Iterative detection of MIMO 

transmission using a list-sequential (LISS) detector," in Proc. ICC '03, 

2003, pp. 2653-2657. 
[16] B. M. Hochwald and S. ten Brink, "Achieving near-capacity on a 

multiple-antenna channel," IEEE Trans. Inform. Theory, vol. 51, pp. 

2764-2772, 2003. 

[17] J. Boutros, N. Gresset, L. Brunel, and M. Fossorier, "Soft-input soft-
output lattice sphere decoder for linear channels," in IEEE Global
Communications Conference, Dec. 2003, pp. 1583-1587. 

[18] C. Kuhn and J. Hagenauer, "Iterative list-sequential (LISS) detector for 
fading multiple-access channels," in GLOBECOM, 2004, pp. 330-335. 

[19] H. Vikalo, B. Hassibi, and T. Kailath, "Iterative Decoding for MIMO 
Channels via Modified Sphere Decoder," IEEE Transactions on Wireless 
Communications, vol. 3, no. 6, pp. 2299-2311, Nov. 2004. 

[20] M. O. Damen, H. El Gamal, and G. Caire, "MMSE-GDFE lattice
decoding for under-determined linear channels," in Proc. 38th Annual 
Conf. on Inform. Sciences and Systems, March 2004. 

[21] M. O. Damen, H. El Gamal, and G. Caire, "MMSE-GDFE lattice
decoding for solving under-determined linear systems with integer
unknowns," in Proc. IEEE Int. Symp. Inform. Theory, June-July 2004,
p. 538. 

[22] J. B. Anderson, "Limited search decoding of convolutional codes," IEEE 

Trans. Inform. Theory, vol. 35, no. 5, pp. 944-955, September 1989. 
[23] S. J. Simmons, "Breadth-first trellis decoding with adaptive effort," 

IEEE Trans. Commun., vol. 38, no. 2, pp. 3-12, January 1990. 
[24] S. ten Brink, "Convergence of iterative decoding," IEEE Electron. Lett., 

vol. 35, pp. 1117-1119, 1999. 
[25] S. Verdu and S. Shamai, "Spectral efficiency of CDMA with random 

spreading," IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 622-640, 

March 1999. 

[26] L. K. Rasmussen, A. J. Grant, and P. D. Alexander, "An extrinsic Kalman 

filter for iterative multiuser decoding," IEEE Trans. Inform. Theory, vol. 

50, no. 4, pp. 642-648, April 2004. 
[27] P. H. Tan and L. K. Rasmussen, "Multiuser detection based on Gaussian 

approximation," in Proc. Workshop on Telecomm. Internet and Signal 

Proc., Adelaide, Australia, December 2004.



